
Conversation

@Prajna1999 Prajna1999 commented Feb 2, 2026

Summary

Target issue is #528
Integrate Gemini 2.5 Pro Preview TTS models with the unified API

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

  1. Currently tested with only Hindi and English as output languages.
  2. The ad hoc main function used to test the TTS scripts will be deleted before the final push.

Summary by CodeRabbit

New Features

  • Added Google Gemini provider support for speech-to-text and text-to-speech applications
  • Introduced comprehensive LLM call tracking and recording system with metadata
  • Enabled multimodal input/output support for text and audio processing
  • Added audio format conversion utilities (PCM to MP3/OGG)
  • Enhanced configuration versioning with partial updates and type immutability protection

Prajna1999 and others added 28 commits January 20, 2026 21:57
…-stt-new' into feature/unified-api-stt-new

merge upstream

coderabbitai bot commented Feb 2, 2026

📝 Walkthrough

This PR introduces comprehensive LLM call tracking, Google Gemini integration for STT/TTS, and partial config version updates. It adds database schema for logging LLM calls, extends models to support multimodal inputs/outputs, implements Google AI provider with audio handling, refactors config versioning to allow partial updates, and updates services to wire LLM call creation/updates into the job execution flow.

Changes

Cohort / File(s) Summary
Database & Migrations
backend/app/alembic/versions/045_add_llm_call_table.py
Creates new llm_call table with comprehensive metadata (input/output types, provider details, content, usage, conversation tracking, soft-delete support), foreign keys to job/project/organization, and filtered indexes for conversation_id and job_id queries.
LLM Models – Core Types & Payloads
backend/app/models/llm/request.py
Introduces provider-specific parameter models (TextLLMParams, STTLLMParams, TTSLLMParams), discriminated input payloads (TextInput, AudioInput with base64 support), discriminated CompletionConfig union, new LlmCall database model with full schema, and enhanced NativeCompletionConfig/KaapiCompletionConfig with type field and validation.
LLM Models – Response & Output
backend/app/models/llm/response.py
Adds TextOutput and AudioOutput discriminated union types with type discriminators, introduces reasoning_tokens field to Usage, replaces monolithic LLMOutput with discriminated union pattern.
LLM Models – Exports & Organization
backend/app/models/llm/__init__.py, backend/app/models/config/__init__.py, backend/app/models/config/version.py, backend/app/models/__init__.py
Reorganizes LLM/config exports to surface new types; introduces ConfigVersionUpdatePartial for partial update payloads with config_blob and commit_message fields.
Config Version CRUD
backend/app/crud/config/version.py
Implements partial-update flow via create_from_partial_or_raise with deep merge, validates immutable fields (config type), derives defaults from latest version, includes ConfigBlob validation with error reporting.
LLM CRUD Operations
backend/app/crud/llm.py
Introduces create_llm_call, update_llm_call_response, get_llm_call_by_id, and get_llm_calls_by_job_id functions; handles input serialization, provider response tracking, audio content size computation, and conversation context linking.
Audio Processing
backend/app/core/audio_utils.py
New module with convert_pcm_to_mp3 and convert_pcm_to_ogg functions for PCM-to-audio-format conversion with error handling and configurable sample rates.
Langfuse & Observability
backend/app/core/langfuse/langfuse.py
Adds extract_output_value helper to normalize TextOutput and AudioOutput for logging/tracing; handles both text and audio metadata in observe_llm_execution.
Provider System – Core
backend/app/core/providers.py
Adds GOOGLE provider enum member and ProviderConfig with required api_key field.
Provider Implementation
backend/app/services/llm/providers/base.py, backend/app/services/llm/providers/oai.py, backend/app/services/llm/providers/gai.py
BaseProvider extended with create_client abstract method and resolved_input parameter; OpenAIProvider refactored to oai.py with create_client implementation and TextOutput construction; new GoogleAIProvider implements STT/TTS with audio file handling, parameter mapping, format conversion, and comprehensive error handling.
Provider Utilities & Registry
backend/app/services/llm/providers/__init__.py, backend/app/services/llm/providers/registry.py, backend/app/services/llm/providers/PLAN.md
Reorganizes imports (openai→oai), adds GoogleAIProvider export; refactors registry to use get_provider_class, adds GOOGLE_NATIVE constant, implements create_client-based instantiation with error handling; adds Gemini 2.5 Pro TTS spec document.
Parameter Mapping & Config Transformation
backend/app/services/llm/mappers.py
Refactors map_kaapi_to_openai_params to handle dict input and reasoning/knowledge_base parameters; introduces map_kaapi_to_google_params for Gemini; extends transform_kaapi_config_to_native to support google-native provider with type field.
Input Resolution & Job Execution
backend/app/services/llm/input_resolver.py, backend/app/services/llm/jobs.py
New input_resolver module provides get_file_extension, resolve_input (text/audio), resolve_audio_base64, and cleanup_temp_file; jobs.py integrates LLM call creation/updates, input resolution lifecycle, config transformation, response tracking, and audio cleanup in execute_job.
Config API & Evaluation Services
backend/app/api/routes/config/version.py, backend/app/services/evaluations/evaluation.py, backend/app/crud/evaluations/core.py
API endpoint switches to ConfigVersionUpdatePartial for partial config updates; evaluation service adds param validation per config type; evaluations CRUD changes model parameter retrieval to dict-based access.
LLM Service Initialization
backend/app/services/llm/__init__.py
Consolidates provider imports including new GoogleAIProvider.
Dependencies
backend/pyproject.toml
Adds google-genai>=1.59.0 and pydub>=0.25.1 for Google AI SDK and audio processing.
Test Coverage – CRUD & Models
backend/app/tests/crud/test_llm.py, backend/app/tests/crud/config/test_*.py, backend/app/tests/crud/test_credentials.py, backend/app/tests/utils/test_data.py
Comprehensive LLM CRUD tests (create/retrieve/update for text/STT/TTS/stored-config scenarios); config test fixtures and version tests updated with type field and partial-update assertions; test-data utilities refactored to generate configs with type and dict-based params.
Test Coverage – Services & Providers
backend/app/tests/services/llm/test_input_resolver.py, backend/app/tests/services/llm/test_jobs.py, backend/app/tests/services/llm/test_mappers.py, backend/app/tests/services/llm/providers/test_*.py
Extensive provider tests: GoogleAI STT/TTS success/failure paths with mocked responses, audio encoding/conversion validation, input/output type handling; job tests refactored for TextOutput/TextContent and dict-based params; mapper tests cover OpenAI reasoning, knowledge_base, and new Google mapping paths; OpenAI provider tests updated for resolved_input parameter.
Test Coverage – API & Evaluation
backend/app/tests/api/routes/configs/test_*.py, backend/app/tests/api/routes/test_evaluation.py, backend/app/tests/api/routes/test_llm.py
Config API tests verify partial-update semantics and type immutability constraints; evaluation tests replace KaapiLLMParams with TextLLMParams; LLM route tests updated for type field and dict-based params in configs.
Database Utilities
backend/app/tests/seed_data/seed_data.py
Minor documentation update noting cascade-delete behavior for ConfigVersion/Config.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API Endpoint
    participant JobService as Job Service
    participant InputResolver as Input Resolver
    participant CRUD as LLM CRUD
    participant Database as Database
    participant Provider as LLM Provider
    
    Client->>API: POST /api/llm/execute
    API->>JobService: start_job(request)
    
    JobService->>InputResolver: resolve_input(query.input)
    InputResolver-->>JobService: resolved_input_path
    
    JobService->>CRUD: create_llm_call(..., request, job_id)
    CRUD->>Database: INSERT llm_call
    Database-->>CRUD: llm_call_id
    CRUD-->>JobService: LlmCall
    
    JobService->>Provider: execute(config, query, resolved_input)
    Provider->>Provider: _execute_[stt|tts](...)
    Provider-->>JobService: LLMCallResponse
    
    JobService->>CRUD: update_llm_call_response(llm_call_id, response)
    CRUD->>Database: UPDATE llm_call
    Database-->>CRUD: updated LlmCall
    
    JobService->>InputResolver: cleanup_temp_file(resolved_input_path)
    InputResolver-->>JobService: cleanup_done
    
    JobService-->>API: job_result
    API-->>Client: response
sequenceDiagram
    participant Client
    participant API as Config API
    participant VersionCRUD as Version CRUD
    participant Database as Database
    
    Client->>API: POST /api/configs/{id}/versions (partial)
    API->>VersionCRUD: create_from_partial_or_raise(ConfigVersionUpdatePartial)
    
    VersionCRUD->>VersionCRUD: _get_latest_version()
    VersionCRUD->>Database: SELECT ConfigVersion WHERE NOT deleted
    Database-->>VersionCRUD: latest_version
    
    VersionCRUD->>VersionCRUD: _deep_merge(latest.config_blob, partial.config_blob)
    VersionCRUD->>VersionCRUD: _validate_immutable_fields(latest, merged)
    VersionCRUD->>VersionCRUD: ConfigBlob.model_validate(merged)
    
    VersionCRUD->>Database: INSERT ConfigVersion
    Database-->>VersionCRUD: new_version
    VersionCRUD-->>API: ConfigVersion
    API-->>Client: version_response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • Unified API: Add support for Kaapi Abstracted LLM Call #498: Modifies Kaapi→native LLM config transformation, KaapiCompletionConfig/NativeCompletionConfig types, and provider registry lookups—directly overlaps with mappers and provider registration in this PR.
  • Evaluation #405: Updates evaluation/batch execution codepath, Alembic migrations, and related models—shares database schema and evaluation service modifications.
  • Evaluation: Use Config Management #477: Modifies evaluation config resolution and model derivation from stored config—related through resolve_model_from_config changes in evaluations CRUD.

Suggested labels

enhancement

Suggested reviewers

  • nishika26
  • vprashrex

Poem

🐰 Hops of joy – a new LLM call table hops in,
Google Gemini joins the provider symphony,
Audio flows, configs blend, partial updates spin,
Tracking every query, every response with glee! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and clearly describes the main change: adding Gemini TTS/STT model integration with a unified API, which aligns with the extensive changes throughout the codebase for Google AI provider support and LLM call infrastructure.
Docstring Coverage ✅ Passed Docstring coverage is 89.35% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/unified-api-tts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@Prajna1999 Prajna1999 force-pushed the feature/unified-api-tts branch from be7f1ad to e7bdf75 on February 7, 2026 15:12
@Prajna1999 Prajna1999 force-pushed the feature/unified-api-tts branch from af1d59b to 2cd9519 on February 8, 2026 16:47
@Prajna1999 Prajna1999 marked this pull request as ready for review February 8, 2026 17:38
@Prajna1999 Prajna1999 self-assigned this Feb 8, 2026
@Prajna1999 Prajna1999 added the enhancement New feature or request label Feb 8, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
backend/app/tests/api/routes/test_llm.py (1)

145-167: ⚠️ Potential issue | 🟡 Minor

Test may validate the wrong error path.

This test intends to verify that an invalid provider is rejected, but the payload at line 153-155 is missing the now-required type field. With discriminated-union validation on type, the 422 may be triggered by the missing type rather than the invalid provider. The assertion still passes but doesn't test what the docstring claims.

Proposed fix
             "config": {
                 "blob": {
                     "completion": {
                         "provider": "invalid-provider",
+                        "type": "text",
                         "params": {"model": "gpt-4"},
                     }
                 }
             },
backend/app/core/langfuse/langfuse.py (1)

241-243: ⚠️ Potential issue | 🟠 Major

Extract query.input value before passing to Langfuse, or verify serialization is safe.

query.input can be a str, TextInput, or AudioInput (discriminated union). Passing complex TextInput/AudioInput objects directly to Langfuse's trace() and generation() input fields may serialize unexpectedly (lines 241, 250). Either extract a loggable representation similar to the pattern used in crud/llm.py with isinstance checks, or verify Langfuse gracefully handles non-string input.
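A minimal sketch of such an extraction helper (the attribute paths follow the types described in this PR, but the exact names and signature are assumptions):

def extract_loggable_input(query_input: str | TextInput | AudioInput) -> str | dict:
    """Sketch: produce a Langfuse-safe representation of a discriminated query input."""
    if isinstance(query_input, str):
        return query_input
    if isinstance(query_input, TextInput):
        return query_input.content.value
    if isinstance(query_input, AudioInput):
        # Never log raw base64 audio; keep only lightweight metadata.
        return {"type": "audio", "base64_length": len(query_input.content.value)}
    return str(query_input)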

backend/app/services/llm/jobs.py (2)

156-158: ⚠️ Potential issue | 🟡 Minor

Trailing comma in log message.

The f-string on line 157-158 ends with a trailing comma and space: "...task_id={task_id}, ". Remove the trailing comma.

Proposed fix
-        f"[execute_job] Starting LLM job execution | job_id={job_id}, task_id={task_id}, "
+        f"[execute_job] Starting LLM job execution | job_id={job_id}, task_id={task_id}"

345-374: ⚠️ Potential issue | 🟠 Major

Output guardrails assume TextOutput — will break for audio responses.

Lines 347, 360-362, and 374 access response.response.output.content.value which works for TextOutput but will fail for AudioOutput (where content is AudioContent with base64 audio data, not meaningful text). If guardrails are configured for a TTS response, this path will crash or produce nonsensical results.

Consider gating guardrail application on the output type:

Proposed fix
         if response:
             if output_guardrails:
+                if response.response.output.type != "text":
+                    logger.warning(
+                        f"[execute_job] Output guardrails skipped for non-text output type: {response.response.output.type}"
+                    )
+                else:
-                output_text = response.response.output.content.value
+                    output_text = response.response.output.content.value

(Then indent the rest of the guardrails block under the else.)

🤖 Fix all issues with AI agents
In `@backend/app/alembic/versions/045_add_llm_call_table.py`:
- Line 20: Add explicit return type annotations to the alembic migration
functions by updating the function signatures for upgrade() and downgrade() to
include a return type (e.g., change "def upgrade():" and "def downgrade():" to
"def upgrade() -> None:" and "def downgrade() -> None:"); locate both
occurrences (the definitions named upgrade and downgrade in this file, including
the other occurrence around the referenced line ~172) and ensure the signatures
use -> None to satisfy the project type-hinting guideline.

In `@backend/app/core/audio_utils.py`:
- Around line 30-45: The convert_pcm_to_ogg function swallows exceptions without
logging details; update its except block to log the exception and context before
returning. Locate convert_pcm_to_ogg and add or use a module logger (e.g.,
logger = logging.getLogger(__name__)), then replace the bare except handling
with logger.exception("Failed to convert PCM to OGG for sample_rate=%s",
sample_rate) (or similar) so the stack trace and error are recorded, and then
return None, str(e) as before.
- Around line 15-27: The convert_pcm_to_mp3 function should log failures and
include a docstring: add a short docstring matching convert_pcm_to_ogg
describing inputs/outputs, and in the except block call the module logger (use
logger.error or logger.exception) to log the error prefixed with
"[convert_pcm_to_mp3]" and include the exception details before returning (e.g.,
logger.exception(f"[convert_pcm_to_mp3] Failed to convert PCM to MP3: %s", e)).
Ensure you reference the existing logger variable and keep the returned tuple
behavior unchanged.

In `@backend/app/services/llm/jobs.py`:
- Around line 196-212: The block in execute_job that imports select, queries
recent jobs, and logs Recent jobs in DB is debug scaffolding and should be
removed; replace the entire "if not job:" branch so it only logs a concise error
using logger.error with the missing job_id (e.g., logger.error(f"[execute_job]
Job not found | job_id={job_id}")) and avoid importing sqlmodel.select or
iterating over other Job entries; keep the else branch that logs the found job
status intact.
- Around line 170-185: The guardrails handling assumes text payloads by directly
reading/writing request.query.input.content.value (TextInput) which will crash
for AudioInput/AudioContent; update the logic in the block around safe_input
handling to first check the input type (e.g., inspect request.query.input.type
or isinstance against TextInput) and only mutate
request.query.input.content.value for text inputs, otherwise skip guardrail
text-rewrite and skip/reject or return an appropriate APIResponse via
APIResponse.failure_response + handle_job_error for non-text types (or leave
audio untouched), ensuring branches reference the same identifiers (safe_input,
request.query.input, APIResponse.failure_response, handle_job_error,
request.callback_url, job_id).
- Around line 340-343: The cleanup currently compares resolved_input (str) to
request.query.input (QueryInput) which will never be equal and causes cleanup to
run for non-temp text inputs; change the finally block in jobs.py to only call
cleanup_temp_file(resolved_input) when the original input was an audio type or
when you explicitly created a temp file: e.g., check
isinstance(request.query.input, AudioInput) (or use a boolean flag like
temp_file_created set where you generate the temp file) before calling
cleanup_temp_file; reference resolved_input, request.query.input,
AudioInput/TextInput, and cleanup_temp_file to locate and fix the logic.

In `@backend/app/services/llm/mappers.py`:
- Around line 32-39: The call to litellm.supports_reasoning currently uses model
= kaapi_params.get("model") which can be None and causes runtime errors; update
the logic so you only call litellm.supports_reasoning(model=f"openai/{model}")
when model is truthy (e.g., after validating model or inside an if model: block)
and otherwise set support_reasoning to a safe default (False or None) so
downstream code using support_reasoning doesn't crash; locate the usage around
the variables model, support_reasoning and the litellm.supports_reasoning call
and move or guard that call accordingly.

In `@backend/app/services/llm/providers/registry.py`:
- Around line 75-159: Remove the ad-hoc testing block starting at the `if
__name__ == "__main__":` section in registry.py (this whole simulated test that
imports dotenv, reads GEMINI_API_KEY, constructs mock_credentials, calls
LLMProvider.get_provider_class, creates a client/instance, builds
NativeCompletionConfig and QueryParams, executes instance.execute, and writes
audio files) so no debug/test code is left in the module; also replace the
unnecessary f-string f"✅ TTS Success!" with a plain string "✅ TTS Success!" (or
remove the f prefix) to satisfy static analysis.

In `@backend/pyproject.toml`:
- Around line 39-40: The Docker image is missing system-level ffmpeg/libav
needed by pydub (used by convert_pcm_to_mp3 and convert_pcm_to_ogg in gai.py),
so update the Dockerfile's apt-get install line to include ffmpeg (or
libav-tools) so audio conversions succeed at runtime; ensure the package is
added to the same RUN apt-get install step and the image build still cleans up
apt caches as before.
🟡 Minor comments (14)
backend/app/services/evaluations/evaluation.py-145-152 (1)

145-152: ⚠️ Potential issue | 🟡 Minor

Unhandled KeyError if config.completion.type is unexpected.

If config.completion.type holds a value not in the param_models dict, line 151 raises a bare KeyError, which is caught by the generic except on line 169 but produces a cryptic error message. Consider using .get() with an explicit error.

Additionally, evaluations currently only support the OpenAI provider (line 109), but there's no guard against non-text types. An STT/TTS config would pass provider validation yet produce invalid evaluation JSONL for /v1/responses.

Proposed fix
-        model_class = param_models[config.completion.type]
+        model_class = param_models.get(config.completion.type)
+        if model_class is None:
+            raise HTTPException(
+                status_code=422,
+                detail=f"Unsupported completion type for evaluation: '{config.completion.type}'. "
+                       f"Only 'text' is currently supported.",
+            )
         validated_params = model_class.model_validate(config.completion.params)
backend/app/tests/api/routes/configs/test_version.py-477-477 (1)

477-477: ⚠️ Potential issue | 🟡 Minor

Unused import: TextLLMParams.

TextLLMParams is imported but never used in this test function.

-    from app.models.llm.request import KaapiCompletionConfig, TextLLMParams
+    from app.models.llm.request import KaapiCompletionConfig
backend/app/models/llm/response.py-7-8 (1)

7-8: ⚠️ Potential issue | 🟡 Minor

Unused imports: base64 and model_validator.

Neither base64 nor model_validator is referenced anywhere in this file.

-import base64
-from pydantic import model_validator
 from sqlmodel import SQLModel, Field
backend/app/tests/utils/test_data.py-339-358 (1)

339-358: ⚠️ Potential issue | 🟡 Minor

config_type can be None if "type" key is missing from the stored config blob.

Line 342 uses .get("type") which returns None if the key is absent. Passing None to NativeCompletionConfig(type=config_type) or KaapiCompletionConfig(type=config_type) will fail validation since type is Literal["text", "stt", "tts"].

Proposed fix — add a default
-            config_type = completion_config.get("type")
+            config_type = completion_config.get("type", "text")
backend/app/services/llm/providers/registry.py-71-71 (1)

71-71: ⚠️ Potential issue | 🟡 Minor

Log message missing function name prefix.

Per coding guidelines, log messages should be prefixed with the function name in square brackets.

Proposed fix
-        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
+        logger.error(f"[get_llm_provider] Failed to initialize {provider_type} client: {e}", exc_info=True)

As per coding guidelines, "Prefix all log messages with the function name in square brackets."

backend/app/services/llm/providers/gai.py-118-118 (1)

118-118: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix from f-strings with no placeholders.

Static analysis (Ruff F541) flags lines 118 and 272 as f-strings that contain no interpolated expressions.

Proposed fix
-            f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
+            "[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
-            f"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
+            "[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"

Also applies to: 272-272

backend/app/crud/llm.py-39-47 (1)

39-47: ⚠️ Potential issue | 🟡 Minor

size_bytes records the base64 string length, not the actual audio byte count.

len(query_input.content.value) where value is a base64-encoded string gives the character count of the encoding, which is ~33% larger than the actual audio data. If the intent is to store real audio size, decode first (or multiply by 3/4). If the intent is just to store the payload size, rename to base64_length to avoid confusion with the decoded-size calculation in update_llm_call_response (line 186-187).
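A quick sketch of the decoded-size variant, assuming value holds plain standard base64 with no data-URI prefix:

import base64

# Actual audio payload size in bytes (raises binascii.Error on malformed input)
size_bytes = len(base64.b64decode(query_input.content.value))
# Cheaper approximation without decoding (ignores padding): len(value) * 3 // 4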

backend/app/models/llm/request.py-180-182 (1)

180-182: ⚠️ Potential issue | 🟡 Minor

Stale description: "LLM provider (openai)" should reflect the expanded provider set.

The field description still says (openai) but the Literal now includes "google" as well.

Proposed fix
     provider: Literal["openai", "google"] = Field(
-        ..., description="LLM provider (openai)"
+        ..., description="LLM provider (openai, google)"
     )
backend/app/crud/llm.py-78-80 (1)

78-80: ⚠️ Potential issue | 🟡 Minor

getattr fallback on a dict will never resolve "type" correctly.

completion_config.params is dict[str, Any] (per both NativeCompletionConfig and KaapiCompletionConfig). getattr(a_dict, "type", "text") checks for a Python attribute named type on the dict object, not a dict key — it would always return "text" regardless of the dict's contents. Per a retrieved learning, the type field on completion_config is mandatory, so completion_config.type should always be set and this fallback is effectively dead code. If the fallback is still desired for safety, use .get().

Proposed fix
-    completion_type = completion_config.type or getattr(
-        completion_config.params, "type", "text"
-    )
+    completion_type = completion_config.type

Based on learnings: "the type field in config_blob.completion (for both NativeCompletionConfig and KaapiCompletionConfig) is compulsory/mandatory during config creation and validation ensures it will never be missing."

backend/app/tests/crud/test_llm.py-42-46 (1)

42-46: ⚠️ Potential issue | 🟡 Minor

Missing return type annotation on test_job fixture.

All other fixtures and test functions include return type hints. This fixture is missing one.

As per coding guidelines: "Always add type hints to all function parameters and return values in Python code."

Proposed fix
 @pytest.fixture
-def test_job(db: Session):
+def test_job(db: Session) -> Any:
     """Create a test job for LLM call tests."""
     crud = JobCrud(db)
     return crud.create(job_type=JobType.LLM_API, trace_id="test-llm-trace")

Replace Any with the actual Job type if available — likely from app.models import Job would be the proper return type.

backend/app/services/llm/mappers.py-98-102 (1)

98-102: ⚠️ Potential issue | 🟡 Minor

model is unconditionally set (possibly to None) unlike the OpenAI mapper which guards with if model.

Line 102 sets google_params["model"] = kaapi_params.get("model") which could store None. The OpenAI mapper (line 61-62) only sets model when truthy. Consider aligning the behavior or adding a guard.

Proposed fix
-    # Model is present in all param types
-    google_params["model"] = kaapi_params.get("model")
+    model = kaapi_params.get("model")
+    if not model:
+        raise ValueError("Missing required 'model' parameter in kaapi_params")
+    google_params["model"] = model
backend/app/crud/llm.py-95-99 (1)

95-99: ⚠️ Potential issue | 🟡 Minor

hasattr on a dict won't find the "model" key — always falls to the else branch.

completion_config.params is a plain dict. hasattr(dict_instance, "model") checks for a Python attribute, not a dict key, so it will always be False. The else branch with .get("model", "") is the one that actually executes. Simplify to just the .get() call.

Proposed fix
-    model = (
-        completion_config.params.model
-        if hasattr(completion_config.params, "model")
-        else completion_config.params.get("model", "")
-    )
+    model = completion_config.params.get("model", "")
backend/app/models/llm/request.py-9-11 (1)

9-11: ⚠️ Potential issue | 🟡 Minor

Duplicate imports: Field and SQLModel are imported twice.

Line 4 imports Field, SQLModel from sqlmodel, and line 11 reimports them along with Index, text. Ruff (F811) correctly flags this. Consolidate into a single import.

Proposed fix
-from sqlmodel import Field, SQLModel
 from pydantic import Discriminator, model_validator, HttpUrl
 from datetime import datetime
 from app.core.util import now
 
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import JSONB
 from sqlmodel import Field, SQLModel, Index, text
backend/app/services/llm/providers/gai.py-230-257 (1)

230-257: ⚠️ Potential issue | 🟡 Minor

Fix misleading comment about Gemini TTS output format.

Line 231 incorrectly states the output is "PCM (WAV format)" — Gemini TTS returns raw PCM bytes without a WAV container. The WAV header is added client-side if needed. Change the comment to clarify: "Gemini TTS natively outputs 24kHz 16-bit PCM (raw audio data, no WAV header)". The conversion logic itself is correct and properly handles raw PCM input.
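For reference, a minimal stdlib sketch of adding that WAV header client-side; the 24000 Hz / 16-bit mono parameters are the Gemini defaults quoted above, and the helper name is illustrative rather than part of this PR:

import io
import wave

def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()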

🧹 Nitpick comments (23)
backend/app/tests/seed_data/seed_data.py (1)

90-97: Consider updating log message prefixes to use individual function names.

All log messages currently use the generic [tests.seed_data] prefix. This makes it harder to filter logs by specific functions during debugging. As per coding guidelines, each function should prefix logs with its own name: [load_seed_data], [create_organization], [clear_database], etc.

Example fix for load_seed_data
     except FileNotFoundError:
         logging.error(
-            f"[tests.seed_data]Error: Seed data file not found at {json_path}"
+            f"[load_seed_data] Error: Seed data file not found at {json_path}"
         )
         raise
     except json.JSONDecodeError as e:
         logging.error(
-            f"[tests.seed_data]Error: Failed to decode JSON from {json_path}: {e}"
+            f"[load_seed_data] Error: Failed to decode JSON from {json_path}: {e}"
         )

As per coding guidelines: "Prefix all log messages with the function name in square brackets."

Also applies to: 110-110, 137-137, 157-157, 212-212, 254-254, 293-293, 344-344, 359-359, 389-389, 418-418, 438-438, 479-479, 481-481

backend/app/services/llm/input_resolver.py (1)

33-33: Remove or replace the # important!! comment.

This comment doesn't convey any useful information. Either remove it or replace with a meaningful note about why this function is critical.

Proposed fix
-# important!!
 def resolve_input(query_input: QueryInput) -> tuple[str, str | None]:
backend/app/tests/services/llm/providers/test_registry.py (1)

23-26: Consider adding a test for the google-native registry entry.

test_registry_contains_openai explicitly asserts "openai-native" is in the registry and maps to OpenAIProvider. With the new google-native provider, a parallel assertion for GoogleAIProvider would prevent regressions on the new provider wiring.
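A sketch of that parallel assertion, mirroring the existing test; the import paths are assumptions based on the modules touched in this PR:

from app.services.llm.providers import GoogleAIProvider
from app.services.llm.providers.registry import LLMProvider

def test_registry_contains_google() -> None:
    """Sketch: google-native should resolve to GoogleAIProvider via the registry."""
    assert LLMProvider.get_provider_class("google-native") is GoogleAIProvider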

backend/app/tests/api/routes/configs/test_config.py (1)

49-59: Consider asserting max_tokens is stripped.

The comment on line 49 states that "invalid fields like max_tokens are stripped," but the assertions only verify that valid fields are present. Adding a negative assertion (e.g., assert "max_tokens" not in ...params) would strengthen this test.

backend/app/models/config/version.py (1)

99-116: Missing empty config_blob validation.

ConfigVersionBase validates that config_blob is non-empty via validate_blob_not_empty, but ConfigVersionUpdatePartial doesn't inherit from it and lacks this check. A request with config_blob: {} would pass model validation. Consider adding the same validator or inheriting it.

Proposed fix
+from pydantic import field_validator
+
 class ConfigVersionUpdatePartial(SQLModel):
     ...
 
     config_blob: dict[str, Any] = Field(
         description="Partial config blob. Only include fields you want to update. "
         "Provider and type are inherited from existing config and cannot be changed.",
     )
     commit_message: str | None = Field(
         default=None,
         max_length=512,
         description="Optional message describing the changes in this version",
     )
+
+    @field_validator("config_blob")
+    def validate_blob_not_empty(cls, value: dict[str, Any]) -> dict[str, Any]:
+        if not value:
+            raise ValueError("config_blob cannot be empty")
+        return value
backend/app/api/routes/config/version.py (1)

25-30: Missing return type annotation on create_version.

As per coding guidelines, all function parameters and return values should have type hints.

Proposed fix
 def create_version(
     config_id: UUID,
     version_create: ConfigVersionUpdatePartial,
     current_user: AuthContextDep,
     session: SessionDep,
-):
+) -> APIResponse[ConfigVersionPublic]:
backend/app/services/llm/providers/PLAN.md (1)

1-88: Consider whether this planning document should be committed to the repository.

This file reads like an internal implementation specification / prompt for development. If it's meant to be permanent documentation, it should live in a docs/ directory with proper formatting. If it was used only during development, consider removing it before merge.

A few text issues if keeping it:

  • Line 84: thorugh → through
  • Line 84: providers.gai.py → providers/gai.py (wrong separator)
  • Line 88: over abstracted → over-abstracted
backend/app/tests/api/routes/configs/test_version.py (1)

522-620: Inconsistent assertion depth across type-change tests.

test_create_version_cannot_change_type_from_text_to_stt (Line 516-519) verifies the error message content ("cannot change config type", "text", "stt"), but the stt→tts and tts→text variants only assert status_code == 400 without checking the error detail. Consider adding the same error message assertions for consistency and to catch regressions in error messaging.
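A sketch of the extra assertions, assuming the error detail is surfaced in the JSON body the same way the text→stt test reads it:

    assert response.status_code == 400
    detail = str(response.json()).lower()
    assert "cannot change config type" in detail
    assert "stt" in detail and "tts" in detail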

backend/app/alembic/versions/045_add_llm_call_table.py (2)

161-168: Redundant alter_column on a table created in the same migration.

Lines 22-138 create llm_call with provider comment set to "AI provider: openai, google, anthropic", then lines 161-168 immediately alter it to a different comment. Set the correct comment directly in create_table and remove this alter_column. Similarly, the downgrade (lines 182-189) alters this column's comment right before dropping the table — also unnecessary.

Proposed fix in create_table
         sa.Column(
             "provider",
             sa.String(),
             nullable=False,
-            comment="AI provider: openai, google, anthropic",
+            comment="AI provider as sent by user (e.g openai, -native, google)",
         ),

Then remove the alter_column blocks for llm_call.provider in both upgrade() and downgrade().


22-152: Consider adding an index on project_id for query performance.

The table has foreign keys on job_id, project_id, and organization_id, but only job_id and conversation_id are indexed. If LLM calls are frequently queried by project (e.g., listing calls for a project), a partial index on project_id (filtered by deleted_at IS NULL) would improve performance.
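A sketch of that partial index in Alembic terms, mirroring the filtered indexes this migration already defines (the index name is illustrative):

    op.create_index(
        "ix_llm_call_project_id",  # illustrative name
        "llm_call",
        ["project_id"],
        unique=False,
        postgresql_where=sa.text("deleted_at IS NULL"),
    )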

backend/app/tests/services/llm/test_mappers.py (2)

9-15: TTSLLMParams is imported but unused in this test file.

No tests exercise TTS-specific parameter mapping for Google (e.g., voice, language fields). Given this PR introduces Gemini TTS, consider adding a test using TTSLLMParams to verify map_kaapi_to_google_params correctly maps voice and language parameters.

Example test for TTS param mapping
def test_tts_params_mapping(self):
    """Test TTS-specific parameter mapping (voice, language)."""
    kaapi_params = TTSLLMParams(
        model="gemini-2.5-pro-preview-tts",
        voice="Kore",
        language="hi-IN",
    )

    result, warnings = map_kaapi_to_google_params(
        kaapi_params.model_dump(exclude_none=True)
    )

    assert result["model"] == "gemini-2.5-pro-preview-tts"
    assert result["voice"] == "Kore"
    assert result["language"] == "hi-IN"
    assert warnings == []

457-496: Consider adding a TTS-specific transform test for Google.

The Google transform tests cover type="stt" (line 461) and type="text" (line 482) but not type="tts", which is the primary use case this PR introduces.

backend/app/tests/services/llm/test_jobs.py (1)

18-22: Remove the commented-out import.

Line 20 has # KaapiLLMParams left as a comment. Clean up dead code.

Proposed fix
     TextOutput,
     TextContent,
-    # KaapiLLMParams,
     KaapiCompletionConfig,
 )
backend/app/services/llm/jobs.py (1)

253-255: Remove else: pass — it's dead code.

Proposed fix
                     request.request_metadata.setdefault("warnings", []).extend(warnings)
-                else:
-                    pass
             except Exception as e:
backend/app/tests/utils/test_data.py (1)

23-23: Remove unused KaapiLLMParams import.

The code constructs configs with dict-based params directly. KaapiLLMParams is imported but never referenced in the file.

Proposed fix
-from app.models.llm import KaapiLLMParams, KaapiCompletionConfig, NativeCompletionConfig
+from app.models.llm import KaapiCompletionConfig, NativeCompletionConfig
backend/app/crud/llm.py (1)

10-17: Nit: group stdlib imports together.

base64 and json (stdlib) are separated from logging, UUID, etc. by third-party imports. Minor readability nit.

Proposed grouping
 import logging
+import base64
+import json
 from typing import Any, Literal
-
 from uuid import UUID
+
 from sqlmodel import Session, select
 from app.core.util import now
-import base64
-import json
 from app.models.llm import LlmCall, LLMCallRequest, ConfigBlob
backend/app/crud/config/version.py (2)

291-341: _validate_config_type_unchanged duplicates _get_latest_version.

Lines 300-311 replicate the same query that _get_latest_version() (lines 155-168) already encapsulates. Consider reusing _get_latest_version() here.

Proposed refactor
     def _validate_config_type_unchanged(
         self, version_create: ConfigVersionCreate
     ) -> None:
-        # Get the latest version
-        stmt = (
-            select(ConfigVersion)
-            .where(
-                and_(
-                    ConfigVersion.config_id == self.config_id,
-                    ConfigVersion.deleted_at.is_(None),
-                )
-            )
-            .order_by(ConfigVersion.version.desc())
-            .limit(1)
-        )
-        latest_version = self.session.exec(stmt).first()
+        latest_version = self._get_latest_version()
 
         # If this is the first version, no validation needed
         if latest_version is None:
             return

34-76: Both create_or_raise and create_from_partial_or_raise share identical version-creation boilerplate.

The try/except block that calls _get_next_version, constructs a ConfigVersion, commits, and handles errors is duplicated across both methods (lines 45-76 and 122-153). Consider extracting a private _persist_version(config_blob_dict, commit_message) helper.

Also applies to: 78-153
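A rough outline of such a helper, with field names taken from the surrounding description and the commit/refresh handling treated as an assumption:

    def _persist_version(
        self, config_blob_dict: dict[str, Any], commit_message: str | None
    ) -> ConfigVersion:
        """Sketch: shared tail of create_or_raise / create_from_partial_or_raise."""
        version = ConfigVersion(
            config_id=self.config_id,
            version=self._get_next_version(),
            config_blob=config_blob_dict,
            commit_message=commit_message,
        )
        self.session.add(version)
        self.session.commit()
        self.session.refresh(version)
        return version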

backend/app/services/llm/mappers.py (1)

42-50: Temperature is always suppressed when the model supports reasoning, even if no reasoning param was provided.

If a user sends only temperature (no reasoning param) for a reasoning-capable model, the temperature is silently dropped with a warning. This may be intentional per OpenAI API constraints, but it could surprise users who didn't opt into reasoning. Consider only suppressing temperature when reasoning is actually set.
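A minimal sketch of the narrower behavior, reusing the variable names from the mapper excerpt above (openai_params and warnings are assumed accumulator names):

    if support_reasoning and reasoning is not None:
        if temperature is not None:
            warnings.append("temperature is ignored when reasoning is enabled")
    elif temperature is not None:
        openai_params["temperature"] = temperature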

backend/app/services/llm/providers/gai.py (4)

196-198: Move from google.genai import types to module-level imports.

Inline imports inside a method body can hide dependencies and incur repeated import overhead on each call. Since google.genai is already imported at the module level (line 3), move this import to the top of the file.

Proposed fix

At the top of the file:

 from google import genai
 from google.genai.types import GenerateContentResponse
+from google.genai import types

Inside _execute_tts, remove:

-        from google.genai import types

25-32: Redundant self.client = client — already set by super().__init__(client).

BaseProvider.__init__ already assigns self.client = client (per the base class snippet). The explicit reassignment on line 32 is unnecessary.

Proposed fix
     def __init__(self, client: genai.Client):
         super().__init__(client)
-        self.client = client

340-350: Broad except Exception swallows the original error message from the caller.

Line 346 sets error_message = "Unexpected error occurred" discarding the actual exception text. While the detailed error is logged, the caller only sees a generic message. This makes debugging harder for API consumers. Consider including a sanitized error summary.
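One possible shape for that sanitized summary, as a sketch that keeps internal details out of the caller-facing message:

    except Exception as e:
        logger.exception("[GoogleAIProvider._execute_tts] Unexpected error")
        # Expose only the exception class, not its message or stack trace.
        error_message = f"Unexpected error occurred ({type(e).__name__})"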


91-97: Uploaded file resource is never cleaned up.

self.client.files.upload(file=resolved_input) uploads the audio file to Google's storage, but there's no finally block or cleanup call (e.g., self.client.files.delete(name=gemini_file.name)) after the transcription completes. Over time this could accumulate orphaned files in the Google AI storage.
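A sketch of best-effort cleanup using the files.delete call mentioned above; the surrounding control flow is assumed, not lifted from the PR:

        gemini_file = self.client.files.upload(file=resolved_input)
        try:
            # ... run transcription against gemini_file ...
            pass
        finally:
            try:
                self.client.files.delete(name=gemini_file.name)
            except Exception as cleanup_error:
                logger.warning(
                    f"[GoogleAIProvider._execute_stt] Failed to delete uploaded file: {cleanup_error}"
                )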

depends_on = None


def upgrade():

🛠️ Refactor suggestion | 🟠 Major

Add return type annotations to upgrade() and downgrade().

Per coding guidelines, all functions should have type hints.

-def upgrade():
+def upgrade() -> None:
-def downgrade():
+def downgrade() -> None:

As per coding guidelines: "Always add type hints to all function parameters and return values in Python code."

Also applies to: 172-172

🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/045_add_llm_call_table.py` at line 20, Add
explicit return type annotations to the alembic migration functions by updating
the function signatures for upgrade() and downgrade() to include a return type
(e.g., change "def upgrade():" and "def downgrade():" to "def upgrade() ->
None:" and "def downgrade() -> None:"); locate both occurrences (the definitions
named upgrade and downgrade in this file, including the other occurrence around
the referenced line ~172) and ensure the signatures use -> None to satisfy the
project type-hinting guideline.

Comment on lines +15 to +27
def convert_pcm_to_mp3(
    pcm_bytes: bytes, sample_rate: int = 24000
) -> tuple[bytes | None, str | None]:
    try:
        audio = AudioSegment(
            data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
        )

        output_buffer = io.BytesIO()
        audio.export(output_buffer, format="mp3", bitrate="192k")
        return output_buffer.getvalue(), None
    except Exception as e:
        return None, str(e)

🛠️ Refactor suggestion | 🟠 Major

Log errors before returning them.

The logger is initialized (Line 12) but never used. When conversion fails, the exception is silently returned as a string. Per coding guidelines, log messages should be prefixed with the function name.

Also, convert_pcm_to_mp3 is missing its docstring (unlike convert_pcm_to_ogg).

Proposed fix
 def convert_pcm_to_mp3(
     pcm_bytes: bytes, sample_rate: int = 24000
 ) -> tuple[bytes | None, str | None]:
+    """Convert raw PCM to MP3 format."""
     try:
         audio = AudioSegment(
             data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
         )
 
         output_buffer = io.BytesIO()
         audio.export(output_buffer, format="mp3", bitrate="192k")
         return output_buffer.getvalue(), None
     except Exception as e:
+        logger.error(f"[convert_pcm_to_mp3] Failed to convert PCM to MP3: {e}")
         return None, str(e)

As per coding guidelines, prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message").

🤖 Prompt for AI Agents
In `@backend/app/core/audio_utils.py` around lines 15 - 27, The convert_pcm_to_mp3
function should log failures and include a docstring: add a short docstring
matching convert_pcm_to_ogg describing inputs/outputs, and in the except block
call the module logger (use logger.error or logger.exception) to log the error
prefixed with "[convert_pcm_to_mp3]" and include the exception details before
returning (e.g., logger.exception(f"[convert_pcm_to_mp3] Failed to convert PCM
to MP3: %s", e)). Ensure you reference the existing logger variable and keep the
returned tuple behavior unchanged.

Comment on lines +30 to +45
def convert_pcm_to_ogg(
    pcm_bytes: bytes, sample_rate: int = 24000
) -> tuple[bytes | None, str | None]:
    """Convert raw PCM to OGG with Opus codec."""
    try:
        audio = AudioSegment(
            data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
        )

        output_buffer = io.BytesIO()
        audio.export(
            output_buffer, format="ogg", codec="libopus", parameters=["-b:a", "64k"]
        )
        return output_buffer.getvalue(), None
    except Exception as e:
        return None, str(e)

🛠️ Refactor suggestion | 🟠 Major

Same logging gap in convert_pcm_to_ogg.

Proposed fix
     except Exception as e:
+        logger.error(f"[convert_pcm_to_ogg] Failed to convert PCM to OGG: {e}")
         return None, str(e)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def convert_pcm_to_ogg(
    pcm_bytes: bytes, sample_rate: int = 24000
) -> tuple[bytes | None, str | None]:
    """Convert raw PCM to OGG with Opus codec."""
    try:
        audio = AudioSegment(
            data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
        )
        output_buffer = io.BytesIO()
        audio.export(
            output_buffer, format="ogg", codec="libopus", parameters=["-b:a", "64k"]
        )
        return output_buffer.getvalue(), None
    except Exception as e:
        return None, str(e)
def convert_pcm_to_ogg(
    pcm_bytes: bytes, sample_rate: int = 24000
) -> tuple[bytes | None, str | None]:
    """Convert raw PCM to OGG with Opus codec."""
    try:
        audio = AudioSegment(
            data=pcm_bytes, sample_width=2, frame_rate=sample_rate, channels=1
        )
        output_buffer = io.BytesIO()
        audio.export(
            output_buffer, format="ogg", codec="libopus", parameters=["-b:a", "64k"]
        )
        return output_buffer.getvalue(), None
    except Exception as e:
        logger.error(f"[convert_pcm_to_ogg] Failed to convert PCM to OGG: {e}")
        return None, str(e)
🤖 Prompt for AI Agents
In `@backend/app/core/audio_utils.py` around lines 30 - 45, The convert_pcm_to_ogg
function swallows exceptions without logging details; update its except block to
log the exception and context before returning. Locate convert_pcm_to_ogg and
add or use a module logger (e.g., logger = logging.getLogger(__name__)), then
replace the bare except handling with logger.exception("Failed to convert PCM to
OGG for sample_rate=%s", sample_rate) (or similar) so the stack trace and error
are recorded, and then return None, str(e) as before.

Comment on lines 170 to +185

             elif safe_input["success"]:
-                request.query.input = safe_input["data"]["safe_text"]
+                # Update the text value within the QueryInput structure
+                request.query.input.content.value = safe_input["data"]["safe_text"]

                 if safe_input["data"]["rephrase_needed"]:
                     callback_response = APIResponse.failure_response(
-                        error=request.query.input,
+                        error=safe_input["data"]["safe_text"],
                         metadata=request.request_metadata,
                     )
                     return handle_job_error(
                         job_id, request.callback_url, callback_response
                     )
             else:
-                request.query.input = safe_input["error"]
+                # Update the text value with error message
+                request.query.input.content.value = safe_input["error"]

⚠️ Potential issue | 🟠 Major

Input guardrails assume TextInput — will crash for AudioInput.

Lines 173 and 185 access request.query.input.content.value, which assumes TextInput. For AudioInput, the .content is AudioContent (base64 data), not text. While it's unlikely guardrails would be configured for audio inputs today, there's no guard preventing it.

Consider adding a type check or skipping guardrails for non-text inputs:

Proposed guard
         if input_guardrails:
+            from app.models.llm.request import TextInput
+            if not isinstance(request.query.input, TextInput):
+                logger.warning("[execute_job] Input guardrails skipped for non-text input type")
+            else:
-            safe_input = call_guardrails(input_query, input_guardrails, job_id)
+                safe_input = call_guardrails(input_query, input_guardrails, job_id)

(Then indent the remaining guardrail handling under the else.)

🤖 Prompt for AI Agents
In `@backend/app/services/llm/jobs.py` around lines 170 - 185, The guardrails
handling assumes text payloads by directly reading/writing
request.query.input.content.value (TextInput) which will crash for
AudioInput/AudioContent; update the logic in the block around safe_input
handling to first check the input type (e.g., inspect request.query.input.type
or isinstance against TextInput) and only mutate
request.query.input.content.value for text inputs, otherwise skip guardrail
text-rewrite and skip/reject or return an appropriate APIResponse via
APIResponse.failure_response + handle_job_error for non-text types (or leave
audio untouched), ensuring branches reference the same identifiers (safe_input,
request.query.input, APIResponse.failure_response, handle_job_error,
request.callback_url, job_id).

Comment on lines +196 to +212
logger.info(f"[execute_job] Attempting to fetch job | job_id={job_id}")
job = session.get(Job, job_id)
if not job:
# Log all jobs to see what's in the database
from sqlmodel import select

all_jobs = session.exec(
select(Job).order_by(Job.created_at.desc()).limit(5)
).all()
logger.error(
f"[execute_job] Job not found! | job_id={job_id} | "
f"Recent jobs in DB: {[(j.id, j.status) for j in all_jobs]}"
)
else:
logger.info(
f"[execute_job] Found job | job_id={job_id}, status={job.status}"
)

🛠️ Refactor suggestion | 🟠 Major

Remove debug diagnostic code before merging.

This block queries recent jobs and logs their IDs on every "job not found" case. This is debug scaffolding (inline from sqlmodel import select, querying unrelated jobs) that shouldn't be in production. A simple error log on the missing job_id is sufficient.

Proposed simplification
             logger.info(f"[execute_job] Attempting to fetch job | job_id={job_id}")
             job = session.get(Job, job_id)
             if not job:
-                # Log all jobs to see what's in the database
-                from sqlmodel import select
-
-                all_jobs = session.exec(
-                    select(Job).order_by(Job.created_at.desc()).limit(5)
-                ).all()
                 logger.error(
-                    f"[execute_job] Job not found! | job_id={job_id} | "
-                    f"Recent jobs in DB: {[(j.id, j.status) for j in all_jobs]}"
+                    f"[execute_job] Job not found! | job_id={job_id}"
                 )
             else:
🤖 Prompt for AI Agents
In `@backend/app/services/llm/jobs.py` around lines 196 - 212, The block in
execute_job that imports select, queries recent jobs, and logs Recent jobs in DB
is debug scaffolding and should be removed; replace the entire "if not job:"
branch so it only logs a concise error using logger.error with the missing
job_id (e.g., logger.error(f"[execute_job] Job not found | job_id={job_id}"))
and avoid importing sqlmodel.select or iterating over other Job entries; keep
the else branch that logs the found job status intact.

Comment on lines +340 to +343
        finally:
            # Clean up temp files for audio inputs
            if resolved_input and resolved_input != request.query.input:
                cleanup_temp_file(resolved_input)

⚠️ Potential issue | 🔴 Critical

Bug: Temp file cleanup condition compares str to QueryInput object — always true, may delete unrelated files.

resolved_input is a str, while request.query.input is a QueryInput (either TextInput or AudioInput — a Pydantic model). They will never be equal, so cleanup_temp_file(resolved_input) executes for every request, including text inputs where resolved_input is the user's text — not a temp file path. With missing_ok=True in cleanup_temp_file, this silently attempts to unlink a path matching the user text, which is unintended and potentially destructive.

Proposed fix — check input type instead
         finally:
             # Clean up temp files for audio inputs
-            if resolved_input and resolved_input != request.query.input:
+            from app.models.llm.request import AudioInput
+            if resolved_input and isinstance(request.query.input, AudioInput):
                 cleanup_temp_file(resolved_input)
🤖 Prompt for AI Agents
In `@backend/app/services/llm/jobs.py` around lines 340 - 343, The cleanup
currently compares resolved_input (str) to request.query.input (QueryInput)
which will never be equal and causes cleanup to run for non-temp text inputs;
change the finally block in jobs.py to only call
cleanup_temp_file(resolved_input) when the original input was an audio type or
when you explicitly created a temp file: e.g., check
isinstance(request.query.input, AudioInput) (or use a boolean flag like
temp_file_created set where you generate the temp file) before calling
cleanup_temp_file; reference resolved_input, request.query.input,
AudioInput/TextInput, and cleanup_temp_file to locate and fix the logic.

Comment on lines +32 to +39
    model = kaapi_params.get("model")
    reasoning = kaapi_params.get("reasoning")
    temperature = kaapi_params.get("temperature")
    instructions = kaapi_params.get("instructions")
    knowledge_base_ids = kaapi_params.get("knowledge_base_ids")
    max_num_results = kaapi_params.get("max_num_results")

    support_reasoning = litellm.supports_reasoning(model=f"openai/{model}")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the mappers.py file to see the full context
cat -n backend/app/services/llm/mappers.py | head -100

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 4282


🏁 Script executed:

# Check if there's type hint information on the function parameter
ast-grep --pattern 'function map_kaapi_to_openai_params($$$) {
  $$$
}'

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Also search for how kaapi_params is defined/documented
rg "kaapi_params" backend/app/services/llm/mappers.py -B 3 -A 3 | head -50

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1645


🏁 Script executed:

# Search for type hints on the function signature
rg "def map_kaapi_to_openai_params" backend/app/services/llm/ -A 5

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 580


🌐 Web query:

litellm supports_reasoning function behavior with None model parameter

💡 Result:

litellm.supports_reasoning(...) is designed to be called with a model name string (it relies on LiteLLM’s “model info” utilities, which take model: str). If you pass model=None, it’s not a supported input and (in practice) you should expect a Python runtime error (e.g., TypeError / AttributeError from trying to do string operations on None) rather than a meaningful True/False result. [1]

If you need “safe” behavior, wrap it like:

import litellm

def safe_supports_reasoning(model):
    return bool(model) and litellm.supports_reasoning(model)

Notes:

  • In LiteLLM model metadata, supports_reasoning is a boolean capability flag and generally defaults to False when missing/None in structured model-info objects—but that’s after you have a valid model record, not when the model argument itself is None. [2]

Sources:
[1] ReadmeX “Utility Functions (litellm/utils.py)” (describes model-info utilities taking model: str) https://readmex.com/en-US/BerriAI/litellm/page-3d4d91461-decc-4cb6-a010-ba823069a2b8
[2] ModelGroupInfo schema showing supports_reasoning defaulting to False and normalizing None bools to False https://huggingface.co/spaces/Shyamnath/inferencing-llm/blob/c40c75ae1fe76d0d34f6eb6fe5fdc1c4f26dab0e/litellm/types/router.py


litellm.supports_reasoning can fail at runtime when the "model" key is absent from kaapi_params.

Line 39 calls litellm.supports_reasoning(model=f"openai/{model}") where model is obtained via .get("model") and can be None. In that case the f-string produces the literal string "openai/None", which litellm cannot map to any known model, so the call either raises or returns a meaningless capability flag. The if model: validation at line 61 runs only after this call, too late to catch the missing value.

Consider validating that model is present before calling litellm.supports_reasoning.
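One possible guard, sketched here under the assumption that a False default is acceptable downstream; the wrapper name and the broad exception handling are illustrative, not part of the PR:

import litellm

def supports_reasoning_or_default(model: str | None) -> bool:
    # Skip the litellm lookup entirely when no model was supplied, so a missing
    # "model" key degrades to False instead of an error at call time.
    if not model:
        return False
    try:
        return litellm.supports_reasoning(model=f"openai/{model}")
    except Exception:
        # Unknown or unmapped model names can also raise inside litellm's model-info lookup.
        return False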

🤖 Prompt for AI Agents
In `@backend/app/services/llm/mappers.py` around lines 32 - 39, The call to
litellm.supports_reasoning currently uses model = kaapi_params.get("model")
which can be None and causes runtime errors; update the logic so you only call
litellm.supports_reasoning(model=f"openai/{model}") when model is truthy (e.g.,
after validating model or inside an if model: block) and otherwise set
support_reasoning to a safe default (False or None) so downstream code using
support_reasoning doesn't crash; locate the usage around the variables model,
support_reasoning and the litellm.supports_reasoning call and move or guard that
call accordingly.

Comment on lines +75 to +159
# ad hoc testing code
if __name__ == "__main__":
    import os
    import base64
    from pathlib import Path
    from dotenv import load_dotenv
    from app.models.llm import NativeCompletionConfig, QueryParams

    load_dotenv()

    # 1. Simulate environment/credentials
    GEMINI_KEY = os.getenv("GEMINI_API_KEY")
    if not GEMINI_KEY:
        print("Set GEMINI_API_KEY environment variable first.")
        exit(1)

    # This dictionary mimics what get_provider_credential would return from the DB
    mock_credentials = {"api_key": GEMINI_KEY}

    # 2. Idiomatic Initialization via Registry
    provider_type = "google-native"

    print(f"Initializing provider: {provider_type}...")

    # This block mimics the core logic of your get_llm_provider function
    ProviderClass = LLMProvider.get_provider_class(provider_type)
    client = ProviderClass.create_client(credentials=mock_credentials)
    instance = ProviderClass(client=client)

    # 3. Setup TTS Config and Query
    test_config = NativeCompletionConfig(
        provider="google-native",
        type="tts",
        params={
            "model": "gemini-2.5-pro-preview-tts",
            "voice": "Kore",
            "language": "hi-IN",
            "response_format": "mp3",
        },
    )

    test_text = (
        "हाँ आपको एक बार डॉक्यूमेंट जमा करने के बाद पाँच से सात दिन का टाइम लगता है"
    )
    test_query = QueryParams(input=test_text)

    # 4. Execution
    print(f"Executing TTS for text: '{test_text}'...")
    result, error = instance.execute(
        completion_config=test_config, query=test_query, resolved_input=test_text
    )

    if error:
        print(f"❌ Error: {error}")
    else:
        print(f"✅ TTS Success!")
        print(f" Provider Response ID: {result.response.provider_response_id}")
        print(f" Model: {result.response.model}")
        print(f" Usage: {result.usage.total_tokens} tokens")

        # 5. Save audio to file
        output_dir = Path("speech_output")
        output_dir.mkdir(exist_ok=True)

        # Get audio data (already base64 encoded by LLMOutput validator)
        if result.response.output.content.value:
            audio_bytes = base64.b64decode(result.response.output.content.value)

            # Use the actual format from the response
            mime = result.response.output.content.mime_type or "audio/wav"
            audio_format = mime.split("/")[-1] if "/" in mime else mime
            output_file = (
                output_dir
                / f"tts_test_{result.response.provider_response_id[:8]}.{audio_format}"
            )

            with open(output_file, "wb") as f:
                f.write(audio_bytes)

            print(f" Audio saved to: {output_file}")
            print(f" Audio size: {len(audio_bytes)} bytes")
            print(f" Audio format: {audio_format.upper()}")
            print(f"\n🎵 Play the audio with: afplay {output_file}")
        else:
            print(" ⚠️ No audio data in response")

⚠️ Potential issue | 🟠 Major

Remove ad-hoc testing block before merging.

The if __name__ == "__main__": block (lines 75–159) is debug/testing code. The PR description acknowledges this will be removed, but flagging to ensure it doesn't slip through. It also contains an f-string without placeholders on line 130 (f"✅ TTS Success!"), as noted by static analysis.

🧰 Tools
🪛 Ruff (0.14.14)

[error] 130-130: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In `@backend/app/services/llm/providers/registry.py` around lines 75 - 159, Remove
the ad-hoc testing block starting at the `if __name__ == "__main__":` section in
registry.py (this whole simulated test that imports dotenv, reads
GEMINI_API_KEY, constructs mock_credentials, calls
LLMProvider.get_provider_class, creates a client/instance, builds
NativeCompletionConfig and QueryParams, executes instance.execute, and writes
audio files) so no debug/test code is left in the module; also replace the
unnecessary f-string f"✅ TTS Success!" with a plain string "✅ TTS Success!" (or
remove the f prefix) to satisfy static analysis.

Comment on lines +39 to +40
"google-genai>=1.59.0",
"pydub>=0.25.1",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

google-genai 1.59.0 pypi package

💡 Result:

google-genai (Google Gen AI Python SDK) version 1.59.0 is published on PyPI and was released on Jan 15, 2026. The PyPI release history also shows newer versions (e.g., 1.62.0 on Feb 4, 2026). [1]

Install exactly 1.59.0:

pip install google-genai==1.59.0

Package page / release history (and docs link shown there): [1]


🏁 Script executed:

# Find gai.py file and check what it imports from google.genai
find . -name "gai.py" -o -name "*gai*" -type f | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 173


🏁 Script executed:

# Search for imports from google.genai in the codebase
rg "from google\.genai|import google\.genai" -A 5

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1384


🏁 Script executed:

# Check for Dockerfile or deployment configuration files
find . -name "Dockerfile*" -o -name "docker-compose*" -o -name "requirements*" | head -10

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 116


🏁 Script executed:

cat ./backend/Dockerfile

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1551


🏁 Script executed:

cat -n ./backend/app/services/llm/providers/gai.py | head -100

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 4364


Add ffmpeg to Dockerfile system dependencies.

The google-genai>=1.59.0 version is available on PyPI and correctly exports GenerateContentResponse, SpeechConfig, and VoiceConfig used in gai.py. However, pydub (imported on line 5 of gai.py) requires ffmpeg or libav installed at the system level. The current Dockerfile does not install either, which will cause audio conversion operations (via convert_pcm_to_mp3/convert_pcm_to_ogg) to fail at runtime. Add ffmpeg to the apt-get install step in the Dockerfile.
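A minimal sketch of the Dockerfile change, assuming a Debian/Ubuntu-based image with an apt-get install step; the exact layer layout is illustrative:

# Install ffmpeg so pydub's conversions (e.g. convert_pcm_to_mp3) work at runtime.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*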

🤖 Prompt for AI Agents
In `@backend/pyproject.toml` around lines 39 - 40, The Docker image is missing
system-level ffmpeg/libav needed by pydub (used by convert_pcm_to_mp3 and
convert_pcm_to_ogg in gai.py), so update the Dockerfile's apt-get install line
to include ffmpeg (or libav-tools) so audio conversions succeed at runtime;
ensure the package is added to the same RUN apt-get install step and the image
build still cleans up apt caches as before.


Labels

enhancement New feature or request
